Take_Home_Ex03_Updated

Author

Wong Kelly

Modified

March 20, 2023

1. Overview

Housing is an essential component of household wealth worldwide. Buying a housing has always been a major investment for most people. The price of housing is affected by many factors. Some of them are global in nature such as the general economy of a country or inflation rate. Others can be more specific to the properties themselves. These factors can be further divided to structural and locational factors. Structural factors are variables related to the property themselves such as the size, fitting, and tenure of the property. Locational factors are variables related to the neighbourhood of the properties such as proximity to childcare centre, public transport service and shopping centre.

Conventional, housing resale prices predictive models were built by using Ordinary Least Square (OLS) method. However, this method failed to take into consideration that spatial autocorrelation and spatial heterogeneity exist in geographic data sets such as housing transactions. With the existence of spatial autocorrelation, the OLS estimation of predictive housing resale pricing models could lead to biased, inconsistent, or inefficient results (Anselin 1998). In view of this limitation, Geographical Weighted Models were introduced for calibrating predictive model for housing resale prices.

1.1 The Task

In this take-home exercise, we are tasked to predict HDB resale prices at the sub-market level (i.e. HDB 3-room, HDB 4-room and HDB 5-room) for the month of January and February 2023 in Singapore. The predictive models must be built by using by using conventional OLS method and GWR methods. We are also required to compare the performance of the conventional OLS method versus the geographical weighted methods.

2. Data Acquisition Source

For the purpose of this take-home exercise, HDB Resale Flat Prices provided by Data.gov.sg should be used as the core data set. The study should focus on either three-room, four-room or five-room flat and transaction period should be from 1st January 2021 to 31st December 2022. The test data should be January and February 2023 resale prices.

In addition, we will also include other locational factors such as proximity of HDB to eldercare services, and shopping malls etc for considerations.

Data Summary Table

Type Name Format Source
Aspatial HDB resale flat prices .csv data.gov.sg
Geospatial Master plan 2014 subzone web boundary .shp data.gov.sg
Geospatial (Locational factor) Elder care services .shp data.gov.sg
Geospatial (Locational factor) Hawker centres .geojson data.gov.sg
Geospatial (Locational factor) MRT stations .shp
Geospatial (Locational factor) Supermarkets .geojson data.gov.sg
Geospatial (Locational factor) Student care services .geojson data.gov.sg
Geospatial (Locational factor) Shopping Malls .csv https://github.com/ValaryLim/Mall-Coordinates-Web-Scraper
Geospatial (Locational factor) Bus Stops .shp data.gov.sg
Geospatial (Locational factor) Dengue clusters .geojson data.gov.sg
Geospatial (Locational factor) Parks .kml data.gov.sg
Geospatial (Locational factor) Kindergartens .shp data.gov.sg

3. Getting Started

3.1 Installing and Loading the R packages

pacman::p_load(olsrr, sf, sfdep, GWmodel, tmap, tidyverse, gtsummary, SpatialML,rsample,Metrics, jsonlite,httr,rvest,sp)

The R packages installed that we will be using for take-home assignment 3 are:

  • sf: used for importing, managing, and processing geospatial data

  • tmap: used for creating thematic maps, such as choropleth and bubble maps

  • tidyverse: a collection of packages for data science tasks

  • sfdep: An interface for ‘spdep’ to integrate with ‘sf’ objects and the ‘tidyverse’

  • plotly: used for creating interactive and dynamic visualisations in R

  • olsrr: designed for use with ordinary least squares (OLS) regression

  • GWmodel: (TO CONTINUEEEE)

4. Data Wrangling: Geospatial Data & Aspatial Data

4.1 Importing Aspatial Data

resale <- read_csv("data/aspatial/resale-flat-prices-based-on-registration-date-from-jan-2017-onwards.csv")

From the results above, we can tell that:

  • The dataset contains 11 columns with 148,000 rows in total.

  • The timeframe of the dataset is from 2017 January to 2023 February up to date (from the review of dataset).

  • The columns that are present in the data are: month, town, flat_type, block, street_name, storey_range, floor_area_sqm, flat_model, lease_commence_date, remaining_lease, resale_price (from the review of dataset).

In this take-home assignment, I selected HDB 4-room flat resale prices to analyse during the transaction period from 1st January 2021 to 31st December 2022. Therefore, we will need to filter and only extract data during this period of time frame.

4.1.1 Filter Resale Data

As mentioned in the previous section of 4.1, we are only interested in (1) HDB 4-room flat resale prices during the period of (2) 1st January 2021 to 31st December 2022. Let’s filter them!

rs_subset <-  filter(resale,flat_type == "4 ROOM") %>% 
              filter(month >= "2021-01" & month <= "2022-12")

From the results above, we can tell that:

  • We have successfully filtered our data based on earlier chosen HDB model flat and transaction period!

  • From January 2021 to December 2022, there are 23,657 transactions for 4-room flat in Singapore.

Additionally, we will need to extract another data set for our testing model which is between the period of January 2023 to February 2023.

rs_subset_test <-  filter(resale,flat_type == "4 ROOM") %>% 
              filter(month >= "2023-01" & month <= "2023-02")

From the results above, we can tell that:

  • We have successfully filtered our test data as well!

  • From January 2023 to February 2023, there are 1848 transactions for 4-room flat in Singapore.

4.1.2 Transform Resale Data

After we have extracted the rows of transactions we are interested in, we will then proceed to use mutate function of dplyr package to create new variables (columns) in a data frame by applying some transformations to the existing columns.

What we will need to do is:

  • address: concatenation of the block and street_name columns using paste() function of base R package.

  • remaining_lease_yr & remaining_lease_mnth: Split the year and months part of the remaining_lease respectively using str_sub() function of stringr package then converting the character to integer using as.integer() function of base R package.

  • After performing mutate function, we will store the new data in rs_transform.

rs_transform <- rs_subset %>%
  mutate(rs_subset, address = paste(block,street_name)) %>%
  mutate(rs_subset, remaining_lease_yr = as.integer(str_sub(remaining_lease, 0, 2))) %>%
  mutate(rs_subset, remaining_lease_mnth = as.integer(str_sub(remaining_lease, 9, 11)))

After we have successfully added the three variables (address, remaining_lease_yr, and remaining_lease_mnth) into a new data named rs_transform, we will see some NA values in the remaining_lease_mnth column. Therefore, we will need to replace those with a value of 0 using is.na() function of base R package.

rs_transform$remaining_lease_mnth[is.na(rs_transform$remaining_lease_mnth)] <- 0
rs_transform
# A tibble: 23,657 × 14
   month   town    flat_…¹ block stree…² store…³ floor…⁴ flat_…⁵ lease…⁶ remai…⁷
   <chr>   <chr>   <chr>   <chr> <chr>   <chr>     <dbl> <chr>     <dbl> <chr>  
 1 2021-01 ANG MO… 4 ROOM  547   ANG MO… 04 TO …      92 New Ge…    1981 59 yea…
 2 2021-01 ANG MO… 4 ROOM  414   ANG MO… 01 TO …      92 New Ge…    1979 57 yea…
 3 2021-01 ANG MO… 4 ROOM  509   ANG MO… 01 TO …      91 New Ge…    1980 58 yea…
 4 2021-01 ANG MO… 4 ROOM  467   ANG MO… 07 TO …      92 New Ge…    1979 57 yea…
 5 2021-01 ANG MO… 4 ROOM  571   ANG MO… 07 TO …      92 New Ge…    1979 57 yea…
 6 2021-01 ANG MO… 4 ROOM  134   ANG MO… 07 TO …      98 New Ge…    1978 56 yea…
 7 2021-01 ANG MO… 4 ROOM  204   ANG MO… 07 TO …      92 New Ge…    1977 55 yea…
 8 2021-01 ANG MO… 4 ROOM  429   ANG MO… 04 TO …      92 New Ge…    1978 56 yea…
 9 2021-01 ANG MO… 4 ROOM  413   ANG MO… 10 TO …      92 New Ge…    1979 57 yea…
10 2021-01 ANG MO… 4 ROOM  415   ANG MO… 07 TO …      92 New Ge…    1979 57 yea…
# … with 23,647 more rows, 4 more variables: resale_price <dbl>, address <chr>,
#   remaining_lease_yr <int>, remaining_lease_mnth <dbl>, and abbreviated
#   variable names ¹​flat_type, ²​street_name, ³​storey_range, ⁴​floor_area_sqm,
#   ⁵​flat_model, ⁶​lease_commence_date, ⁷​remaining_lease

Now, as we scroll to the remaining_lease_mnth column, we noticed all initial “NA” values have been replaced by 0!

Next, we do not want to segregate the remaining lease in years and months columns. Instead, we could convert the remaining_lease_yr to months unit and create a new column call total_remaining_lease for easier analysis later using mutate function of dplyr package which contains the summation of the remaining_lease_yr and remaining_lease_mnth using rowSum() function of base R package. Here is how we do it!

# Multiply remaining_lease_yr column in months unit
rs_transform$remaining_lease_yr <- rs_transform$remaining_lease_yr * 12

# Create a new column: total_remaining_lease to contain the summation of yr and mnth
rs_transform <- rs_transform %>% 
  mutate(rs_transform, total_remaining_lease = rowSums(rs_transform[, c("remaining_lease_yr", "remaining_lease_mnth")])) %>%
  select(month, town, address, block, street_name, flat_type, storey_range, floor_area_sqm, flat_model, 
         lease_commence_date, total_remaining_lease, resale_price)

# Display head of data
head(rs_transform)
# A tibble: 6 × 12
  month   town     address block stree…¹ flat_…² store…³ floor…⁴ flat_…⁵ lease…⁶
  <chr>   <chr>    <chr>   <chr> <chr>   <chr>   <chr>     <dbl> <chr>     <dbl>
1 2021-01 ANG MO … 547 AN… 547   ANG MO… 4 ROOM  04 TO …      92 New Ge…    1981
2 2021-01 ANG MO … 414 AN… 414   ANG MO… 4 ROOM  01 TO …      92 New Ge…    1979
3 2021-01 ANG MO … 509 AN… 509   ANG MO… 4 ROOM  01 TO …      91 New Ge…    1980
4 2021-01 ANG MO … 467 AN… 467   ANG MO… 4 ROOM  07 TO …      92 New Ge…    1979
5 2021-01 ANG MO … 571 AN… 571   ANG MO… 4 ROOM  07 TO …      92 New Ge…    1979
6 2021-01 ANG MO … 134 AN… 134   ANG MO… 4 ROOM  07 TO …      98 New Ge…    1978
# … with 2 more variables: total_remaining_lease <dbl>, resale_price <dbl>, and
#   abbreviated variable names ¹​street_name, ²​flat_type, ³​storey_range,
#   ⁴​floor_area_sqm, ⁵​flat_model, ⁶​lease_commence_date

Upon inspection of the rs_transform, we now only left with one column: total_remaining_lease that contains all the remaining lease in months!

4.1.3 Retrieve Postal Codes and Coordinates of Addresses

In this section, we will focus on retrieving the relevant data like postal codes and coordinates of the address which is required to get the proximity to locational factors in the later parts.

Here are the steps to add its longitude and latitude features with OneMapSG API!

Step 1: Create a list storing unique addresses

add_list <- sort(unique(rs_transform$address))

Step 2: Create function to retrieve coordinates from OneMapSG API

get_coords <- function(add_list){
  
  # Create a data frame to store all retrieved coordinates
  postal_coords <- data.frame()
    
  for (i in add_list){

    r <- GET('https://developers.onemap.sg/commonapi/search?',
           query=list(searchVal=i,
                     returnGeom='Y',
                     getAddrDetails='Y'))
    
    # Send a GET request to OneMap API with address as searchVal,
    # returnGeom as 'Y' to retrieve the coordinates, and getAddrDetails as 'Y' to retrieve the postal code

    
    data <- fromJSON(rawToChar(r$content))
    found <- data$found
    res <- data$results
    
    # Extract the 'found' and 'results' fields from the API reponses
    
    # Create a new data frame for each address
    new_row <- data.frame()
    
    # If single result, append 
    if (found == 1){
      postal <- res$POSTAL 
      lat <- res$LATITUDE
      lng <- res$LONGITUDE
      new_row <- data.frame(address= i, postal = postal, latitude = lat, longitude = lng)
    }
    
    # If multiple results, drop NIL and append top 1
    else if (found > 1){
      # Remove those with NIL as postal
      res_sub <- res[res$POSTAL != "NIL", ]
      
      # Set as NA first if no Postal
      if (nrow(res_sub) == 0) {
          new_row <- data.frame(address= i, postal = NA, latitude = NA, longitude = NA)
      }
      
      else{
        top1 <- head(res_sub, n = 1)
        postal <- top1$POSTAL 
        lat <- top1$LATITUDE
        lng <- top1$LONGITUDE
        new_row <- data.frame(address= i, postal = postal, latitude = lat, longitude = lng)
      }
    }

    else {
      new_row <- data.frame(address= i, postal = NA, latitude = NA, longitude = NA)
    }
    
    # Add the row
    postal_coords <- rbind(postal_coords, new_row)
  }
  return(postal_coords)
}

Step 3: Call get_coords function to retrieve resale coordinates

coords <- get_coords(add_list)

4.1.4 Combine Resale and Coordinates Data

After we have done retrieving the location coordinates of all the resale HDBs, we need to now combine our resale data (rs_transform) earlier with the coordinates data (coords) using left_join() function.

rs_coords <- left_join(rs_transform, coords, by = c('address' = 'address'))

Great! We have successfully joined the two data sets and now let’s write the file to our rds folder!

rs_coords_rds <- write_rds(rs_coords, "data/rds/rs_coords.rds")

Now, let’s read rs_coords RDS file:

rs_coords <- read_rds("data/rds/rs_coords.rds")

4.1.5 Assign and Transform CRS and Check

The coordinate columns (latitude, longitude) are currently in decimal degrees, the projected CRS will be WGS84. We will need to convert it into a spatial data frame with projected coordinates of 3414.

rs_coords_sf <- st_as_sf(rs_coords,
                    coords = c("longitude", 
                               "latitude"),
                    crs=4326) %>%
  st_transform(crs = 3414)
st_crs(rs_coords_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

4.1.5.1 Check for Invalid Geometries

length(which(st_is_valid(rs_coords_sf) == FALSE))
[1] 0

We have no invalid geometries! Now, let’s plot hdb resale points

tmap_mode("view")
tm_shape(rs_coords_sf)+
  tm_dots(col="red", size = 0.02)
tmap_mode("plot")

5. Import Geospatial Locational Factors Data (WITH geographic coordinates)

5.1 Import ALL Locational Factors Data Sets

In this section, we will read and process all the locational factors data as they are important in determining the HDB resale prices as we believe good location with good amenities will have a higher resale price and vice versa.

Firstly, we will need to read and check CRS of all locational factors!

#geojson files
hawker_sf <- st_read("data/geospatial_locational_GC/hawker-centres/hawker-centres-geojson.geojson")
Reading layer `hawker-centres-geojson' from data source 
  `C:\G2wongkelly\IS415_GAA\Take_Home_Ex\Take_Home_Ex03\data\geospatial_locational_GC\hawker-centres\hawker-centres-geojson.geojson' 
  using driver `GeoJSON'
Simple feature collection with 125 features and 2 fields
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 103.6974 ymin: 1.272716 xmax: 103.9882 ymax: 1.449217
z_range:       zmin: 0 zmax: 0
Geodetic CRS:  WGS 84
supermarket_sf <- st_read("data/geospatial_locational_GC/supermarkets/supermarkets-geojson.geojson")
Reading layer `supermarkets-geojson' from data source 
  `C:\G2wongkelly\IS415_GAA\Take_Home_Ex\Take_Home_Ex03\data\geospatial_locational_GC\supermarkets\supermarkets-geojson.geojson' 
  using driver `GeoJSON'
Simple feature collection with 526 features and 2 fields
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 103.6258 ymin: 1.24715 xmax: 104.0036 ymax: 1.461526
z_range:       zmin: 0 zmax: 0
Geodetic CRS:  WGS 84
student_sf <- st_read("data/geospatial_locational_GC/student-care-services/student-care-services-geojson.geojson")
Reading layer `student-care-services-geojson' from data source 
  `C:\G2wongkelly\IS415_GAA\Take_Home_Ex\Take_Home_Ex03\data\geospatial_locational_GC\student-care-services\student-care-services-geojson.geojson' 
  using driver `GeoJSON'
Simple feature collection with 425 features and 2 fields
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 103.6877 ymin: 1.27474 xmax: 103.963 ymax: 1.456667
z_range:       zmin: 0 zmax: 0
Geodetic CRS:  WGS 84
dengue_sf <- st_read("data/geospatial_locational_GC/dengue-clusters/dengue-clusters-geojson.geojson")
Reading layer `dengue-clusters-geojson' from data source 
  `C:\G2wongkelly\IS415_GAA\Take_Home_Ex\Take_Home_Ex03\data\geospatial_locational_GC\dengue-clusters\dengue-clusters-geojson.geojson' 
  using driver `GeoJSON'
Simple feature collection with 15 features and 2 fields
Geometry type: POLYGON
Dimension:     XYZ
Bounding box:  xmin: 103.7058 ymin: 1.275541 xmax: 103.9008 ymax: 1.383897
z_range:       zmin: 0 zmax: 0
Geodetic CRS:  WGS 84
train_sf <- st_read("data/geospatial_locational_GC/mrtstation/lta-mrt-station-exit-geojson.geojson")
Reading layer `lta-mrt-station-exit-geojson' from data source 
  `C:\G2wongkelly\IS415_GAA\Take_Home_Ex\Take_Home_Ex03\data\geospatial_locational_GC\mrtstation\lta-mrt-station-exit-geojson.geojson' 
  using driver `GeoJSON'
Simple feature collection with 474 features and 2 fields
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 103.6368 ymin: 1.264972 xmax: 103.9893 ymax: 1.449157
z_range:       zmin: 0 zmax: 0
Geodetic CRS:  WGS 84
#shp files
elder_sf <- st_read(dsn = "data/geospatial_locational_GC/eldercare-services", layer="ELDERCARE")
Reading layer `ELDERCARE' from data source 
  `C:\G2wongkelly\IS415_GAA\Take_Home_Ex\Take_Home_Ex03\data\geospatial_locational_GC\eldercare-services' 
  using driver `ESRI Shapefile'
Simple feature collection with 133 features and 18 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 14481.92 ymin: 28218.43 xmax: 41665.14 ymax: 46804.9
Projected CRS: SVY21
bus_sf <- st_read(dsn = "data/geospatial_locational_GC/BusStopLocation", layer="BusStop")
Reading layer `BusStop' from data source 
  `C:\G2wongkelly\IS415_GAA\Take_Home_Ex\Take_Home_Ex03\data\geospatial_locational_GC\BusStopLocation' 
  using driver `ESRI Shapefile'
Simple feature collection with 5159 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
kindergarten_sf <- st_read(dsn = "data/geospatial_locational_GC/kindergartens", layer="KINDERGARTENS")
Reading layer `KINDERGARTENS' from data source 
  `C:\G2wongkelly\IS415_GAA\Take_Home_Ex\Take_Home_Ex03\data\geospatial_locational_GC\kindergartens' 
  using driver `ESRI Shapefile'
Simple feature collection with 448 features and 15 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 11909.7 ymin: 25596.33 xmax: 43395.47 ymax: 48562.06
Projected CRS: SVY21
#csv file
mall_sf <- read_csv("data/geospatial_locational_GC/Mall-Coordinates/mall_coordinates_updated.csv")

#kml file
park_sf <- st_read("data/geospatial_locational_GC/parks/parks.kml")
Reading layer `NATIONALPARKS_New' from data source 
  `C:\G2wongkelly\IS415_GAA\Take_Home_Ex\Take_Home_Ex03\data\geospatial_locational_GC\parks\parks.kml' 
  using driver `KML'
Simple feature collection with 421 features and 2 fields
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 103.6929 ymin: 1.214491 xmax: 104.0538 ymax: 1.462094
z_range:       zmin: 0 zmax: 0
Geodetic CRS:  WGS 84

From the results above, we can see that the datasets are all in different CRS:

  • The datasets with WGS84 are:

    • hawker_sf, supermarket_sf, student_sf, dengue_sf, park_sf,train_sf

    • As the EPSG code are in 4326 which is the appropriate EPSG code for WGS84, we will only need to transform the CRS later on

  • The datasets with SVY21 are:

    • elder_sf, train_sf, bus_sf, kindergarten_sf

    • For all of these datasets with SVY21 ???

  • The dataset read in csv????

5.2 Transform all Data to CRS EPSG 3414

Transform Data to EPSG 3414

elder_sf <- st_set_crs(elder_sf, 3414)
train_sf <- st_set_crs(train_sf, 3414)
bus_sf <- st_set_crs(bus_sf, 3414)
kindergarten_sf <- st_set_crs(kindergarten_sf, 3414)

hawker_sf <- hawker_sf %>%
  st_transform(crs = 3414)
supermarket_sf <- supermarket_sf %>%
  st_transform(crs = 3414)
student_sf <- student_sf %>%
  st_transform(crs = 3414)
dengue_sf <- dengue_sf %>%
  st_transform(crs = 3414)
park_sf <- park_sf %>%
  st_transform(crs = 3414)

st_crs

st_crs(elder_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(train_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(bus_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(kindergarten_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(hawker_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(supermarket_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(student_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(dengue_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(park_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

From the above results, we can see that the EPSG code of all the data has now been assigned correctly and they are all EPSG 3414.

5.2.1 Check for Invalid Geometries

Since all the datasets above have been converted to the appropraite EPSG, we should also check for any invalid geometries to avoid any issues when calculating proximity or plot the map.

length(which(st_is_valid(elder_sf) == FALSE))
[1] 0
length(which(st_is_valid(train_sf) == FALSE))
[1] 0
length(which(st_is_valid(bus_sf) == FALSE))
[1] 0
length(which(st_is_valid(kindergarten_sf) == FALSE))
[1] 0
length(which(st_is_valid(hawker_sf) == FALSE))
[1] 0
length(which(st_is_valid(supermarket_sf) == FALSE))
[1] 0
length(which(st_is_valid(student_sf) == FALSE))
[1] 0
length(which(st_is_valid(dengue_sf) == FALSE))
[1] 0
length(which(st_is_valid(park_sf) == FALSE))
[1] 0

From the results above, we can see that there are no invalid geometries for all the locational factors! That means we can move on to calculate proximity.

5.3 Calculate Proximity

5.3.1 Create get_prox Function to Calculate Proximity

get_prox <- function(df1, df2, varname){
  
  # creates a matrix of distances
  dist_matrix <- st_distance(df1, df2)           
  
  # find the nearest location_factor and create new data frame
  near <- df1 %>% 
    mutate(PROX = apply(dist_matrix, 1, function(x) min(x)) / 1000) 
  
  # rename column name according to input parameter
  names(near)[names(near) == 'PROX'] <- varname

  # Return df
  return(near)
}

5.3.1 Call Get_Prox Function

rs_coords_sf <- get_prox(rs_coords_sf, elder_sf, "PROX_ELDERLYCARE") 
rs_coords_sf <- get_prox(rs_coords_sf, train_sf, "PROX_TRAIN") 
rs_coords_sf <- get_prox(rs_coords_sf, bus_sf, "PROX_BUS") 
rs_coords_sf <- get_prox(rs_coords_sf, kindergarten_sf, "PROX_KINDERGARTEN") 
rs_coords_sf <- get_prox(rs_coords_sf, hawker_sf, "PROX_HAWKER")
rs_coords_sf <- get_prox(rs_coords_sf, supermarket_sf, "PROX_SUPERMARKET")
rs_coords_sf <- get_prox(rs_coords_sf, student_sf, "PROX_STUDENT")
rs_coords_sf <- get_prox(rs_coords_sf, dengue_sf, "PROX_DENGUE")
rs_coords_sf <- get_prox(rs_coords_sf, park_sf, "PROX_PARK")

5.3.1 Create get_within Function to Calculate factors that are within declared distance

get_within <- function(df1, df2, threshold_dist, varname){
  
  # creates a matrix of distances
  dist_matrix <- st_distance(df1, df2)   
  
  # count the number of location_factors within threshold_dist and create new data frame
  wdist <- df1 %>% 
    mutate(WITHIN_DT = apply(dist_matrix, 1, function(x) sum(x <= threshold_dist)))
  
  # rename column name according to input parameter
  names(wdist)[names(wdist) == 'WITHIN_DT'] <- varname

  # Return df
  return(wdist)
}

5.3.2 Call get_within Function

rs_coords_sf <- get_within(rs_coords_sf, kindergarten_sf, 350, "WITHIN_350M_KINDERGARTEN")
rs_coords_sf <- get_within(rs_coords_sf, student_sf, 350, "WITHIN_350M_CHILDCARE")
rs_coords_sf <- get_within(rs_coords_sf, bus_sf, 350, "WITHIN_350M_BUS")

6. Import Geospatial Locational Factors Data (WITHOUT geographic coordinates)

6.2 CBD

6.3 Shopping Malls

6.4 Primary Schools

6.5 Good Primary Schools